9 research outputs found

    The road from manual to automatic semantic indexing of biomedical literature: a 10 years journey

    Biomedical experts are facing challenges in keeping up with the vast amount of biomedical knowledge published daily. With millions of citations added to databases like MEDLINE/PubMed each year, efficiently accessing relevant information becomes crucial. Traditional term-based searches may lead to irrelevant or missed documents due to homonyms, synonyms, abbreviations, or term mismatch. To address this, semantic search approaches employing predefined concepts with associated synonyms and relations have been used to expand query terms and improve information retrieval. The National Library of Medicine (NLM) plays a significant role in this area, indexing citations in the MEDLINE database with topic descriptors from the Medical Subject Headings (MeSH) thesaurus, enabling advanced semantic search strategies to retrieve relevant citations despite the synonymy and polysemy of biomedical terms. Over time, advancements in semantic indexing have been made, with Machine Learning facilitating the transition from manual to automatic semantic indexing in the biomedical literature. The paper highlights the journey of this transition, starting with manual semantic indexing and the initial efforts toward automatic indexing. The BioASQ challenge has served as a catalyst in revolutionizing the domain of semantic indexing, further pushing the boundaries of efficient knowledge retrieval in the biomedical field.
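
The query expansion idea mentioned in this abstract can be illustrated with a minimal sketch. The thesaurus below is a hypothetical toy dictionary, not actual MeSH data, and `expand_query` is an illustrative helper rather than any NLM or PubMed API.

```python
# Hypothetical toy thesaurus mapping a preferred concept name to synonyms.
# These entries are illustrative assumptions, not taken from MeSH.
THESAURUS = {
    "myocardial infarction": ["heart attack", "MI"],
    "neoplasms": ["tumors", "cancer"],
}


def expand_query(query, thesaurus):
    """Return the query plus all names of any concept the query matches."""
    terms = [query]
    q = query.lower()
    for concept, synonyms in thesaurus.items():
        names = [concept] + synonyms
        # If the query equals any name of the concept, add the other names.
        if any(name.lower() == q for name in names):
            terms.extend(name for name in names if name.lower() != q)
    return terms


expanded = expand_query("heart attack", THESAURUS)
```

A search for "heart attack" would then also match documents that only mention the preferred term or the abbreviation, which is the term-mismatch problem the abstract describes.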

    Beyond MeSH: Fine-Grained Semantic Indexing of Biomedical Literature based on Weak Supervision

    In this work, we propose a method for the automated refinement of subject annotations in biomedical literature at the level of concepts. Semantic indexing and search of biomedical articles in MEDLINE/PubMed are based on semantic subject annotations with MeSH descriptors that may correspond to several related but distinct biomedical concepts. Such semantic annotations do not adhere to the level of detail available in the domain knowledge and may not be sufficient to fulfil the information needs of experts in the domain. To this end, we propose a new method that uses weak supervision to train a concept annotator on the literature available for a particular disease. We test this method on the MeSH descriptors for two diseases: Alzheimer's Disease and Duchenne Muscular Dystrophy. The results indicate that concept-occurrence is a strong heuristic for automated subject annotation refinement and its use as weak supervision can lead to improved concept-level annotations. The fine-grained semantic annotations can enable more precise literature retrieval, sustain the semantic integration of subject annotations with other domain resources and ease the maintenance of consistent subject annotations, as new more detailed entries are added in the MeSH thesaurus over time. Comment: 36 pages, 8 figures; dictionary-based baselines added and conclusions updated

    iASiS Open Data Graph: Automated Semantic Integration of Disease-Specific Knowledge

    In biomedical research, unified access to up-to-date domain-specific knowledge is crucial, as such knowledge is continuously accumulated in scientific literature and structured resources. Identifying and extracting specific information is a challenging task and computational analysis of knowledge bases can be valuable in this direction. However, for disease-specific analyses researchers often need to compile their own datasets, integrating knowledge from different resources, or reuse existing datasets that may be out of date. In this study, we propose a framework to automatically retrieve and integrate disease-specific knowledge into an up-to-date semantic graph, the iASiS Open Data Graph. This disease-specific semantic graph provides access to knowledge relevant to specific concepts and their individual aspects, in the form of concept relations and attributes. The proposed approach is implemented as an open-source framework and applied to three diseases (Lung Cancer, Dementia, and Duchenne Muscular Dystrophy). Exemplary queries are presented, investigating the potential of this automatically generated semantic graph as a basis for retrieval and analysis of disease-specific knowledge. Comment: 6 pages, 2 figures, accepted in IEEE 33rd International Symposium on Computer Based Medical Systems (CBMS2020)

    Large-scale fine-grained semantic indexing of biomedical literature based on weakly-supervised deep learning

    Semantic indexing of biomedical literature is usually done at the level of MeSH descriptors, representing topics of interest for the biomedical community. Several related but distinct biomedical concepts are often grouped together in a single coarse-grained descriptor and are treated as a single topic for semantic indexing. This study proposes a new method for the automated refinement of subject annotations at the level of concepts, investigating deep learning approaches. Lacking labelled data for this task, our method relies on weak supervision based on concept occurrence in the abstract of an article. The proposed approach is evaluated on an extended large-scale retrospective scenario, taking advantage of concepts that eventually become MeSH descriptors, for which annotations become available in MEDLINE/PubMed. The results suggest that concept occurrence is a strong heuristic for automated subject annotation refinement and can be further enhanced when combined with dictionary-based heuristics. In addition, such heuristics can be useful as weak supervision for developing deep learning models that can achieve further improvement in some cases. Comment: 48 pages, 5 figures, 9 tables, 1 algorithm
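
The concept-occurrence heuristic described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the concept names and terms in `CONCEPT_TERMS` are hypothetical placeholders, and matching is plain case-insensitive whole-phrase search.

```python
import re

# Hypothetical fine-grained concepts under one coarse descriptor, each with
# a few surface terms. These are illustrative assumptions, not MeSH entries.
CONCEPT_TERMS = {
    "early-onset form": ["early onset", "early-onset"],
    "late-onset form": ["late onset", "late-onset"],
}


def weak_labels(abstract, concept_terms):
    """Return the set of concepts with at least one term in the abstract."""
    text = abstract.lower()
    labels = set()
    for concept, terms in concept_terms.items():
        for term in terms:
            # Whole-phrase match so a term does not fire inside longer words.
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            if re.search(pattern, text):
                labels.add(concept)
                break  # one matching term is enough for this concept
    return labels


labels = weak_labels("We study late onset cases in a large cohort.", CONCEPT_TERMS)
```

Labels produced this way are noisy, which is why the abstract treats them as weak supervision for training a model rather than as final annotations.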

    Overview of BioASQ 2023: The eleventh BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering

    This is an overview of the eleventh edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2023. BioASQ is a series of international challenges promoting advances in large-scale biomedical semantic indexing and question answering. This year, BioASQ consisted of new editions of the two established tasks b and Synergy, and a new task (MedProcNER) on semantic annotation of clinical content in Spanish with medical procedures, which have a critical role in medical practice. In this edition of BioASQ, 28 competing teams submitted the results of more than 150 distinct systems in total for the three different shared tasks of the challenge. Similarly to previous editions, most of the participating systems achieved competitive performance, suggesting the continuous advancement of the state-of-the-art in the field. Comment: 24 pages, 12 tables, 3 figures. CLEF2023. arXiv admin note: text overlap with arXiv:2210.0685

    Overview of BioASQ 2021-MESINESP track. Evaluation of advance hierarchical classification techniques for scientific literature, patents and clinical trials

    CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania. There is a pressing need to exploit recent advances in natural language processing technologies, in particular language models and deep learning approaches, to enable improved retrieval, classification and ultimately access to information contained in multiple, heterogeneous types of documents. This is particularly true for the field of biomedicine and clinical research, where medical experts and scientists need to carry out complex search queries against a variety of document collections, including literature, patents, clinical trials or other kinds of content like EHRs. Indexing documents with structured controlled vocabularies used for semantic search engines and query expansion purposes is a critical task for enabling sophisticated user queries and even cross-language retrieval. Due to the complexity of the medical domain and the use of very large hierarchical indexing terminologies, implementing efficient automatic systems to aid manual indexing is extremely difficult. This paper provides a summary of the MESINESP task results on medical semantic indexing in Spanish (BioASQ/CLEF 2021 Challenge). MESINESP was carried out in direct collaboration with literature content databases and medical indexing experts using the DeCS vocabulary, a resource similar to MeSH. Seven participating teams used advanced technologies including extreme multilabel classification and deep language models to solve this challenge, which can be viewed as a multi-label classification problem. As MESINESP resources, we have released a Gold Standard collection of 243,000 documents with a total of 2179 manual annotations divided into training, development and test subsets covering literature, patents as well as clinical trial summaries, under a cross-genre training and data labeling scenario. Manual indexing of the evaluation subsets was carried out by three independent experts using a specially developed indexing interface called ASIT. Additionally, we have published a collection of large-scale automatic semantic annotations based on NER systems of these documents with mentions of drugs/medications (170,000), symptoms (137,000), diseases (840,000) and clinical procedures (415,000). In addition to a summary of the technologies used by the teams, this paper...

    BioASQ at CLEF2022: the tenth edition of the large-scale biomedical semantic indexing and question answering challenge

    The tenth version of the BioASQ Challenge will be held as an evaluation Lab within CLEF2022. The motivation driving BioASQ is the continuous advancement of approaches and tools to meet the need for efficient and precise access to the ever-increasing biomedical knowledge. In this direction, a series of annual challenges are organized, in the fields of large-scale biomedical semantic indexing and question answering, formulating specific shared-tasks in alignment with the real needs of the biomedical experts. These shared-tasks and their accompanying benchmark datasets provide a unique common testbed for investigating and comparing new approaches developed by distinct teams around the world for identifying and accessing biomedical information. In particular, the BioASQ Challenge consists of shared-tasks in two complementary directions: (a) the automated indexing of large volumes of unlabelled biomedical documents, primarily scientific publications, with biomedical concepts, (b) the automated retrieval of relevant material for biomedical questions and the generation of comprehensible answers. In the first direction on semantic indexing, two shared-tasks are organized for English and Spanish content respectively, the latter considering human-interpretable evidence extraction (NER and concept linking) as well. In the second direction, two shared-tasks are organized as well, one for biomedical question answering and one particularly focusing on the developing issue of COVID-19. As BioASQ rewards the approaches that manage to outperform the state of the art in these shared-tasks, the research frontier is pushed towards ensuring that the valuable biomedical knowledge will be identifiable and accessible by the biomedical experts. Google was a proud sponsor of the BioASQ Challenge in 2021. The tenth edition of BioASQ is also sponsored by Atypon Systems Inc. The DisTEMIST task is supported by the Spanish Plan for the Advancement of Language Technologies (Plan TL), the 2020 Proyectos de I+D+i-RTI Tipo A (Descifrando El Papel De Las Profesiones En La Salud De Los Pacientes A Traves De La Mineria De Textos, PID2020-119266RA-I00), and HORIZON-CL4-2021-RESILIENCE-01 (BIOMAT+, 101058779). Peer Reviewed. Postprint (author's final draft)

    BioASQ at CLEF2023: The Eleventh Edition of the Large-Scale Biomedical Semantic Indexing and Question Answering Challenge

    The large-scale biomedical semantic indexing and question-answering challenge (BioASQ) aims at the continuous advancement of methods and tools to meet the need of biomedical researchers and practitioners for efficient and precise access to the ever-increasing resources of their domain. With this purpose, during the last ten years a series of annual challenges have been organized with specific shared tasks on large-scale biomedical semantic indexing and question answering. Benchmark datasets have been concomitantly provided in alignment with the real needs of biomedical experts. BioASQ provides a unique common testbed where different teams around the world can investigate and compare new approaches for identifying and accessing biomedical knowledge. The eleventh version of the BioASQ Challenge will be held as an evaluation Lab within CLEF2023. In this version, three shared tasks will be presented: (i) the automated retrieval of relevant material for biomedical questions, and the generation of comprehensible answers; (ii) the synergistic retrieval of relevant material and generation of answers for open biomedical questions about developing topics, in collaboration with the experts posing the questions; (iii) the automated indexing of unlabelled clinical procedures-specific medical documents, primarily clinical case reports written in Spanish, with biomedical concepts and the extraction of human-interpretable evidence. As BioASQ rewards the methods that outperform the state of the art in these shared tasks, it pushes the research frontier towards approaches that accelerate access to biomedical knowledge. Google was a proud sponsor of the BioASQ Challenge in 2022. The eleventh edition of BioASQ is also sponsored by Atypon Systems Inc. The MedProcNER task is supported by the Spanish Plan for the Advancement of Language Technologies (Plan TL), the 2020 Proyectos de I+D+i-RTI Tipo A (Descifrando El Papel De Las Profesiones En La Salud De Los Pacientes A Traves De La Mineria De Textos, PID2020-119266RA-I00). This project has received funding from the European Union Horizon Europe Coordination and Support Action under Grant Agreement No 101058779 (BIOMATDB) and DataTools4Heart - DT4H, Grant agreement No 101057849. Peer Reviewed. Postprint (author's final draft)